Quantitative Traits
Preliminaries
If you are not already familiar with the structure of these exercises, read the Introduction first.
Reminder: Save your work regularly.
If you are using a Mac, we recommend that you use either Chrome or Firefox to complete these exercises. Some of the default settings in Safari prevent these exercises from running.
Contact information
If you have questions about these exercises, please contact Dr. Kevin Middleton (middletonk@missouri.edu) or drop by Tucker 224.
Learning objectives
The learning objectives for this exercise are:
- Explain how polygenic traits differ from Mendelian traits
- Explain how quantitative traits with continuous-valued phenotypic measures result from the combined effects of many different genes
- Describe how many genes can each contribute a small amount to a measurable phenotype
- Explain what quantitative trait loci (QTL) are and how QTL are identified
- Explain how the contributions of many genes of small effect can be associated with a disease or condition
Contrasting Mendelian traits and polygenic traits
The first phenotypes you learned about, including those described in the first exercise in this series (Transmission of Genetic Information), were Mendelian traits. For a Mendelian trait, a single gene is responsible for the trait. In this context, you also learned about dominant and recessive alleles (and their variations), which lead to different observable phenotypes.
This set of exercises focuses on phenotypes that are determined by multiple genes: polygenic traits. Polygenic traits can often (but not always) be measured on a numeric scale and are thus referred to as quantitative traits. The two main types of quantitative traits are:
- Meristic traits: Traits that take on integer values such as the number of peas in a pod. 3, 4, and 5 are all possible values, but a pod can’t have 3.5 peas.
- Continuous-valued traits: Traits that can take on any value on the number line. For example, lengths and weights can take any value within their range, measured on a scale such as grams or millimeters.
One major difference between Mendelian and polygenic traits is that we have to shift our thinking from the dominant vs. recessive framework used for Mendelian traits to simply considering alternative alleles at a given locus in the genome. We’ll come back to this idea later.
For now, let’s think about all the different ways that alleles from different genes can be combined with one another when gametes are formed.
Counting the ways that alleles can combine
The 1:2:1 genotypic ratio and 3:1 phenotypic ratio in the heterozygote cross example for a dominant Mendelian trait (Figure 1) represent theoretical probabilities for the distributions of genotypes and phenotypes.
Crosses involving two or three genes can still be done by hand, but you will quickly find that keeping track of all the combinations becomes very challenging (Figure 2).
Punnett squares with more than 3 genes are extremely difficult to work with. For example, a cross with 4 genes requires 256 combinations (a 16 x 16 grid), and a cross with 5 genes requires 1,024 combinations (a 32 x 32 grid).
Fortunately, rather than keeping track of all these combinations manually, we can just calculate them directly using a little bit of math.
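The sketch below shows why Punnett squares grow so quickly. It assumes both parents are heterozygous at every gene and that the genes assort independently. Under those assumptions each parent can produce 2^n gamete types for n genes, so the square has 2^n x 2^n cells. The helper name `punnett_cells` is hypothetical and is not part of the exercises.

```python
def punnett_cells(n_genes: int) -> int:
    """Number of cells in a Punnett square for a cross between two parents that
    are heterozygous at n_genes independently assorting genes (hypothetical helper)."""
    gametes_per_parent = 2 ** n_genes  # each gene contributes one of 2 alleles to a gamete
    return gametes_per_parent ** 2     # rows (one parent) x columns (the other parent)


for n in range(1, 6):
    side = 2 ** n
    print(f"{n} gene(s): {side} x {side} grid = {punnett_cells(n)} cells")
```

The grid side doubles with each added gene, so the number of cells quadruples: 64 for 3 genes, 256 for 4, and 1,024 for 5.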
Flipping coins
Before we think about combining many alleles, let’s consider something simpler: flipping coins. Imagine that you flip a coin twice. Each flip is independent of the other (i.e., getting heads or tails on one flip does not make the next flip any more likely to be heads or tails).
Because each flip can result in either heads or tails, with two coin flips there are four possibilities:
- Heads, Heads
- Heads, Tails
- Tails, Heads
- Tails, Tails
To get two heads or two tails, both coin flips have to be the same. But to get one head and one tail, there are two possibilities:
- Heads, Tails
- Tails, Heads
Because the order doesn’t matter, both result in one of each (one head and one tail). If we add up the possible sets of results, we have three possible outcomes but 4 ways to arrive at them:
- 2 Heads (1 way)
- 1 Head, 1 Tail (2 ways)
- 2 Tails (1 way)
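The counting above can be reproduced by brute force. The minimal Python sketch below lists every ordered sequence of two flips and then tallies the sequences by their number of heads, because order does not matter. "H" and "T" are just labels chosen for this example.

```python
from collections import Counter
from itertools import product

# Every ordered sequence of two flips: (H, H), (H, T), (T, H), (T, T)
sequences = list(product("HT", repeat=2))

# Order doesn't matter, so tally the sequences by how many heads they contain
ways_by_heads = Counter(seq.count("H") for seq in sequences)

for heads in sorted(ways_by_heads, reverse=True):
    print(f"{heads} head(s): {ways_by_heads[heads]} way(s)")
```

This prints 1 way for 2 heads, 2 ways for 1 head, and 1 way for 0 heads, which matches the list above.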
If you look back at the monohybrid cross above (Figure 1), you will find that there are 1 DD, 2 Dd, and 1 dd. This 1:2:1 genotypic ratio is the same as in our coin-flipping example.
The number of different ways to arrive at a given count of heads from a set of coin tosses is given by a number called the binomial coefficient. The equation for the binomial coefficient for the number of heads from a set of coin flips is:
\[\frac{Flips!}{Heads! \times (Flips - Heads)!}\]
Those exclamation points (!) denote the factorial function. For example, 3! = 3 x 2 x 1 = 6. By convention, 0! = 1.
If we plug in the numbers for 1 Head from 2 Flips:
\[\frac{2!}{1! \times (2 - 1)!}\]
which reduces to:
\[\frac{2 \times 1}{1 \times 1}\]
which is just \(\frac{2}{1}\) or 2. Thus there are 2 ways to get 1 head from 2 coin flips according to the binomial coefficient, just like we figured out manually.
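The same formula can be written directly with factorials. The short Python sketch below does that and checks the result against `math.comb`, Python's built-in binomial coefficient (Python 3.8+). The exercises themselves use R's `choose()`. The function name `ways_to_get_heads` is hypothetical.

```python
from math import comb, factorial


def ways_to_get_heads(flips: int, heads: int) -> int:
    """Binomial coefficient written out with factorials, following the formula above."""
    return factorial(flips) // (factorial(heads) * factorial(flips - heads))


print(ways_to_get_heads(2, 1))  # 2, matching the hand count
print(comb(2, 1))               # math.comb gives the same value
```

Integer division (`//`) is safe here because the factorial ratio is always a whole number.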
At this point, hopefully you are convinced that we can use the binomial coefficient to calculate the number of ways that alleles can combine, and that these calculations match the counts we would get by hand.
We can calculate the binomial coefficient directly using the choose() function. Run the code below to confirm that there are 1, 2, and 1 ways to get the different genotypes. Here, n represents the number of alleles (2 times the number of genes, because each individual receives 2 copies of each gene), and k represents the number of D alleles that any one individual receives.
Feel free to change the values of n (the number of alleles) and k (the number of D alleles) and try out some additional combinations. In general, n should be greater than or equal to k; otherwise there are zero ways to get a particular combination (see footnote 1).
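To check these numbers outside the interactive exercise, here is a rough Python counterpart. The assumption is that `math.comb(n, k)` plays the role of the exercise's R `choose(n, k)`. The sketch tabulates k = 0, 1, 2 for a single gene (n = 2) and shows the case where k exceeds n.

```python
from math import comb

n = 2  # number of alleles: 2 x the number of genes (here, a single gene)
for k in range(n + 1):  # k = number of D alleles an individual receives
    print(f"{k} D allele(s): {comb(n, k)} way(s)")

# When k is larger than n there is no way to get that combination;
# math.comb returns 0 in that case rather than raising an error.
print(comb(2, 3))  # 0
```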
From coin flips to independent alleles
Recall that we stipulated that coin flips are independent of one another. The same holds when gametes are formed: alleles of different genes are distributed independently of one another (as long as the genes are not linked).
We can replace heads and tails with different alleles, like D and d above. One difference is that, for quantitative traits, instead of thinking about dominant vs. recessive alleles, we usually just consider alternative alleles at a certain position in the genome.
For this example, we will call these alternate alleles A and T. For a single gene, any individual could have AA, AT, or TT. Figure 3 shows these combinations.
If you run the code below, it will generate a figure like the one above, but showing all the combinations for a trihybrid cross with 3 genes, like the Punnett square example shown earlier (Figure 2). Notice that the Punnett square grid is 8 by 8, giving 64 possible combinations. The total number of combinations in the plot is also 64, but with far less manual bookkeeping.
Because there are three genes, there are 6 allele positions, each of which can be either A or T. Keep in mind that when there is more than one gene, the A’s and T’s are in different genes, so each is independent of the others. In reality, each could be any of the nucleotides A, T, G, and C. The math works out the same, but the accounting is easier if we treat them all as either A or T. Very often geneticists don’t even worry about which nucleotide is present when thinking about quantitative traits. Instead, they think only about which allele is more common at a given position in the genome (the “major allele”) and which is less common (the “minor allele”).
The choose() function below will calculate the number of ways to get 3 A and 3 T.
Try changing the value of k to get the number of ways to get 0, 1, 2, 4, 5, and 6 A alleles. Check your answers against the plot above.
We can calculate all the ways for 0 through 6 A alleles (0:6) and then add them up with sum():
This is where the 64 total combinations in the figure come from. Now let’s move on to more genes.
Predict what the distribution of possible genotype combinations would be when four genes are involved. In general, what will the shape of the distribution look like? Will there be more or fewer total possible combinations than for 3 genes? How many more or fewer?
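The Python sketch below approaches the 3-gene (6-allele) case in two ways. First, it lists every A/T combination explicitly and counts the A's. Second, it uses the binomial coefficient for each count from 0 through 6, which is the same idea as `sum(choose(6, 0:6))` in the R exercise. The two tallies agree, and they add up to 64.

```python
from collections import Counter
from itertools import product
from math import comb

n_genes = 3
n_alleles = 2 * n_genes  # 6 allele positions, each either A or T

# Brute force: list every combination of A/T and count the A's in each one
brute_force = Counter(combo.count("A") for combo in product("AT", repeat=n_alleles))

# Shortcut: the binomial coefficient for each possible count of A (0 through 6)
by_formula = {k: comb(n_alleles, k) for k in range(n_alleles + 1)}

print(by_formula)                # {0: 1, 1: 6, 2: 15, 3: 20, 4: 15, 5: 6, 6: 1}
print(sum(by_formula.values()))  # 64
assert dict(brute_force) == by_formula
```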
Execute the code block below to generate the plot.
How does the plot compare to your prediction?
Return to the code block above and continue increasing the number of genes: 5, 6, 7, 8, 10, 15, 20, 50, 100, 200, 300 (see footnote 2). Run the code each time to regenerate the plot.
What happens to the number of combinations as the number of genes increases?
For any number of genes, what is the most probable count of A’s and T’s?
What happens to the percentage of rare combinations (e.g., all A or all T) as the number of genes increases?
Combinations of alleles for a set of genes, where the data take the form of “one or the other” (heads or tails, major allele or minor allele), follow what is called a binomial distribution. The single-gene case, with 4 possible combinations and three possible genotypes (AA, AT, and TT), is the simplest example. One interesting feature of binomial distributions is that as the number of “chances” (i.e., the number of genes or alleles) increases, the distribution takes on a characteristic shape. When there are fewer than about 10 genes, the distribution appears stepped. But as the number approaches 20, it becomes smoother and smoother.
The characteristic shape that a binomial distribution takes on with many genes is called a “bell curve”, which is also the shape of one of the most common distributions in biology: the normal distribution. A normal distribution has a single peak at the center and decreases gradually moving away from the peak, to very small probabilities far from the center.
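To see the bell curve emerge numerically, the Python sketch below compares the binomial probabilities for the count of A alleles with a normal curve. The normal curve is given the same mean and standard deviation. It assumes A and T are equally likely at every position (p = 0.5), so for n alleles the mean is n/2 and the standard deviation is sqrt(n)/2. These are the standard binomial formulas np and sqrt(np(1 - p)). The function names are hypothetical.

```python
from math import comb, exp, pi, sqrt


def binomial_probs(n_alleles: int) -> list:
    """Probability of each count of A alleles (0..n) when A and T are equally likely."""
    total = 2 ** n_alleles
    return [comb(n_alleles, k) / total for k in range(n_alleles + 1)]


def normal_density(x: float, mean: float, sd: float) -> float:
    """Height of the normal (bell) curve at x."""
    return exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))


def max_gap_from_normal(n_genes: int) -> float:
    """Largest difference between the binomial probabilities and a normal curve
    with the same mean (n/2) and standard deviation (sqrt(n)/2)."""
    n = 2 * n_genes
    mean, sd = n / 2, sqrt(n) / 2
    return max(abs(p - normal_density(k, mean, sd))
               for k, p in enumerate(binomial_probs(n)))


for genes in (1, 3, 10, 50):
    print(f"{genes:>3} genes: largest gap from the normal curve = {max_gap_from_normal(genes):.5f}")
```

The gap shrinks as genes are added, which is the numerical counterpart of the stepped plot smoothing into a bell curve.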
Quantitative traits result from combinations of many alleles
So far we have built up from simple Mendelian traits to distributions of many alleles taking on the shape of a normal distribution. How can we make the leap from combinations of alleles to quantitative traits?
The solution is to assign a small positive or negative value to each allele; the size of that value depends on many factors. In essence, we can imagine that any quantitative trait has a baseline value that is modified up or down by the presence of each allele. By counting the numbers of “positive” and “negative” alleles, we can arrive at a phenotypic measurement.
To do this, we have to make some assumptions:
- All genes have roughly equal effects (no genes have more impact on the phenotype than any others)
- All genes act additively, so that we can count alleles to arrive at a phenotype (additivity can include adding negative values)
- Genes do not interact with one another (epistasis) or with the environment (genotype by environment interactions)
- Our simulation accounts for all of the phenotypic variation in a trait
In real-world biological systems, each of these assumptions is violated to one degree or another. Nonetheless, we can use this framework to begin to understand quantitative traits.
Alleles to quantitative traits
Let’s return to the allelic combination plot for 5 genes (Figure 4). There are over 1,000 possible combinations of alleles (1,024), but only 11 possible counts of A alleles, from 0 A/10 T to 10 A/0 T. The most likely combination is 5 A and 5 T.
For this simulation “experiment”, we will use a common plant model organism: Arabidopsis thaliana (a relative of mustard; Figure 5). Arabidopsis is commonly used to study the genetics of quantitative traits in plants because it grows quickly and easily in a greenhouse, where the environment can be easily controlled.
A common measure of the amount of growth in plants is above-ground biomass. Plants such as Arabidopsis are grown in controlled conditions. After a certain period of time, the plants are harvested, dried, and weighed.
Imagine that under certain conditions, the mean above-ground biomass of Arabidopsis is 5000 mg (i.e., 5 g) and that 5 genes control the range of biomass. Each T results in 50 mg lower biomass, and each A results in 50 mg higher biomass.
So a plant with 10 T and no A would weigh 5000 - (10 * 50) = 4500 mg, because each of the 10 T’s subtracts 50 mg from 5000. Similarly, a plant with 10 A’s would weigh 5000 + (10 * 50) = 5500 mg.
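The counts behind Figure 4 can be checked with a few lines of Python. This sketch uses `math.comb` as a stand-in for the exercise's R `choose()`. There are 10 alleles for 5 genes, which gives 1,024 combinations spread over 11 possible counts of A.

```python
from math import comb

n_alleles = 10  # 5 genes x 2 alleles each
ways = [comb(n_alleles, k) for k in range(n_alleles + 1)]  # list index = number of A alleles

print(sum(ways))                         # 1024 total combinations
print(len(ways))                         # 11 possible counts of A (0 through 10)
print(ways.index(max(ways)), max(ways))  # 5 252: five A alleles is the most common outcome
```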
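The additive model in this example can be written as a small function. This is a minimal sketch. It takes the numbers from the worked arithmetic (a 5000 mg baseline, +50 mg per A, -50 mg per T, 10 alleles) and assumes every allele that is not A is T. The constant and function names are hypothetical.

```python
# Values taken from the worked example: baseline 5000 mg, +50 mg per A, -50 mg per T.
BASELINE_MG = 5000
EFFECT_A_MG = 50
EFFECT_T_MG = -50
N_ALLELES = 10  # 5 genes x 2 alleles each


def biomass_mg(n_a: int) -> int:
    """Predicted biomass (mg) for a plant carrying n_a A alleles, under the
    purely additive model (hypothetical helper, not part of the exercise)."""
    if not 0 <= n_a <= N_ALLELES:
        raise ValueError("n_a must be between 0 and N_ALLELES")
    n_t = N_ALLELES - n_a  # every allele that is not A is T
    return BASELINE_MG + n_a * EFFECT_A_MG + n_t * EFFECT_T_MG


print(biomass_mg(0))   # 4500 (10 T, no A)
print(biomass_mg(10))  # 5500 (10 A, no T)
```

Because each allele simply adds or subtracts a fixed amount, a phenotype is predicted by counting alleles. This is the additivity assumption listed earlier.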
What would a plant with 5 T and 5 A weigh? Briefly explain your reasoning.
If we randomly sampled Arabidopsis plants, we would expect to find genotypes matching the expected distribution in Figure 4.
What do you predict the range of above ground biomass in Arabidopsis would be?
If we were to weigh a large number of Arabidopsis plants, we would find a distribution that looks like Figure 6.
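A simulation along the lines of Figure 6 can be sketched in Python. Assume each of a plant's 10 alleles is independently A or T with equal probability, and apply the additive model from the example (+50 mg per A, -50 mg per T, around 5000 mg). Tallying many simulated plants then gives a rough text histogram. The function name, the sample size, and the fixed random seed are all illustrative choices, not part of the exercise.

```python
import random
from collections import Counter
from statistics import mean


def simulate_biomass(n_plants: int, n_genes: int = 5, seed: int = 1) -> list:
    """Draw random plants whose alleles are independently A or T with equal
    probability, then apply the additive model (+50 mg per A, -50 mg per T)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_alleles = 2 * n_genes
    weights = []
    for _ in range(n_plants):
        n_a = sum(rng.random() < 0.5 for _ in range(n_alleles))  # count of A alleles
        weights.append(5000 + 50 * n_a - 50 * (n_alleles - n_a))
    return weights


plants = simulate_biomass(10_000)
counts = Counter(plants)
for weight in sorted(counts):
    print(f"{weight} mg  {'#' * (counts[weight] // 50)}")  # one '#' per 50 plants
print("mean:", round(mean(plants)), "mg")
```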
Case study: the distribution of human height
The National Health and Nutrition Examination Survey (“NHANES”) began in the early 1960s and continues to this day. The goal is to assess the health and nutrition status of a broad cross-section of the population. As part of this study, routine measurements of body size such as height (in cm) are recorded for each participant.
The 2017-2020 NHANES survey has data for 13,137 individuals.
Figure 7 shows…
Generating a normal distribution from combinations of alleles
Figure 9 FIXME
Try increasing the number of genes (e.g., 10, 20, 50, 100, 300).
As the number of genes increases, how do the distributions of actual heights and simulated heights compare to one another? How does the amount of phenotypic variation attributable to each allele change as the number of genes contributing to height increases?
Describing distributions
Using distributions
Associating QTLs with genetic variants
Intro SNP
Shapiro pigeon example (dominant trait)
Human Mendelian diseases are “easy” to identify.
QTL for Human Height
Estimated to be ~700, explaining ~16% of the variation (Lango Allen et al. 2010)
- Best understood quantitative trait in humans
- Yet still 700 genes
- The largest GWAS to date, described by Yengo et al. (2018), involves ~700,000 individuals
- 3290 (“near-independent”) SNPs explain ~25% of the phenotypic variation in human height among a sample of Europeans
Case Study: Threshold traits
Schizophrenia (~200 genes)
Why family history is one of the most important diagnostic tools in medicine
Feedback
FIXME
References
Footnotes
1. As a side note, the binomial coefficient is also used to calculate probabilities for lotteries (the odds of winning the Powerball jackpot are 1 in choose(69, 5) * choose(26, 1)) and for poker hands (the odds of a royal flush in 5-card draw are 4 in choose(52, 5), or equivalently 1 in choose(52, 5) / 4), among other games of chance. In fact, much of basic probability theory was originally developed in the 17th and 18th centuries to try to understand (and cheat at) various forms of gambling.
2. With more than 500 genes, there are more combinations than the computer can keep track of (about 10^308), so the plot has a maximum of 500 genes.